ABSTRACT: Coronaviruses (CoVs) infect humans and multiple other animal species, causing highly prevalent and severe diseases. 3C-like proteases (3CLpros) from CoVs (also called main proteases) are essential for viral replication and are also involved in polyprotein cleavage and immune regulation, making them attractive and effective targets for the development of antiviral drugs. Herein, the 3CLpro from the porcine epidemic diarrhea virus (PEDV), an enteropathogenic CoV, was used as a model to identify novel crucial residues for enzyme activity. First, we established a rapid, sensitive, and efficient luciferase-based biosensor to monitor the activity of PEDV 3CLpro in vivo. Using this luciferase biosensor, along with confirming the well-known catalytic residues (His41 and Cys144), we identified 4 novel proteolytically inactivated mutants of PEDV 3CLpro, which were also confirmed in mammalian cells by biochemical experiments. Our molecular dynamics (MD) simulations showed that the hydrogen bonding interactions occurring within and outside of the protease's active site and the dynamic fluctuations of the substrate, especially the van der Waals contacts, were drastically altered, a situation related to the loss of 3CLpro activity. These data suggest that changing the intermolecular dynamics in protein-substrate complexes eliminates the mechanism underlying the protease activity. The discovery of novel crucial residues for enzyme activity in the binding pocket could potentially provide more druggable sites for the design of protease inhibitors. In addition, our in-depth study of the dynamic substrate envelope model using MD simulations is an approach that could augment the discovery of new inhibitors against 3CLpro in CoVs and other viral 3C proteases.
Zhou, J., Fang, L., Yang, Z., Xu, S., Lv, M., Sun, Z., Chen, J., Wang, D., Gao, J., Xiao, S. Identification of novel proteolytically inactive mutations in coronavirus 3C-like protease using a combined approach. FASEB J. 33, 000-000 (2019). www.fasebj.org

Coronaviruses (CoVs) are important pathogens capable of causing severe, fatal, and highly prevalent diseases in humans and other animals (1, 2). Since the outbreak of severe acute respiratory syndrome (SARS) CoV in 2003 (3) and the outbreak of Middle East respiratory syndrome CoV in 2012 (4, 5), CoVs have attracted increasing attention. CoVs are prone to genetic mutation, bringing about new variants and the reemergence of old ones. For example, porcine epidemic diarrhea virus (PEDV), a swine enteropathogenic CoV that causes lethal watery diarrhea in piglets, was first identified in the early 1970s (6). PEDV reemerged in 2010, with a large-scale outbreak in China that rapidly spread to the United States and other countries, resulting in enormous economic losses to the global pig farming industry (7). In addition, this emerging PEDV variant possesses the potential to infect humans, thereby posing a significant threat to public health (8). Although vaccines against PEDV have been developed, the continuous emergence of new serotypes and recombination events between field and vaccine strains mean that vaccination is only partially successful (9, 10).

CoV 3C-like proteases (3CLpros), which are also referred to as the main proteases in these viruses, are encoded by nonstructural protein 5 and are essential for viral replication. 3CLpros in CoVs share highly conserved substrate recognition pockets, which are responsible for cleaving the viral polyprotein and the host factors involved in the innate immune response, including the signal transducer and activator of transcription 2 and the NF-κB essential modulator signaling protein (11-14). Thus, targeting 3CLpro serves as a 2-pronged attack on the virus by preventing viral maturation and restoring the natural immune response. One strategy used in the rational design of protease inhibitor drugs is to exploit the interactions occurring in the protease's active site, an approach mainly based on the in-depth study of the substrate-active site interaction. Furthermore, such inhibitors are often designed to be in close proximity with the catalytic residues in the protease active site to avoid drug resistance (14-16). Hence, the discovery of more crucial residues in the protease active site could theoretically provide additional druggable sites. A series of inhibitors acting against 3CLpro from CoVs to prevent viral replication has been reported since the SARS outbreak in 2003 (17-22). Nevertheless, a theoretical understanding of proteases, substrate-active site interactions, and high-level resistance to protease inhibitors in viruses is not yet fully developed. Therapeutic options and treatment outcomes for patients infected with HIV or hepatitis C virus (HCV) have greatly benefited from structure and molecular dynamics (MD)-based drug design approaches, specifically with respect to viral protease inhibitor development. Moreover, the dynamic substrate envelope model with MD simulations has clearly explained the molecular mechanism of drug resistance in a clinically significant variant of the HCV and HIV proteases. Unlike the static information gained from crystal structures, the MD simulations in several studies permitted a detailed analysis of the interaction network, in terms of the direct interactions with substrate within the active site and the internal electrostatic network throughout the enzyme, both of which are reportedly critical requirements for tight substrate binding (14, 23-28).

When considering the development of protease inhibitors, the most important criterion is the ability to detect protease activity. However, the traditional methods often involve protein purification and enzyme activity inhibition experiments in vitro, which are inefficient and cannot meet the requirements of high-throughput screening in vivo. Therefore, there is an urgent need for a simple, efficient, and high-throughput method to detect protease activity at the cellular level in order to fully reflect the biologic characteristics of a protease. As a reporter protein, firefly luciferase is widely used to detect apoptosis and enzyme activity and is also used to screen for antiapoptotic drugs and identify enzyme recognition sequences. In theory, a firefly luciferase reporter-based approach would also allow for the identification and screening of the specific amino acids affecting the activity of a viral protease (29-32).

Consequently, in this study, we developed a combined strategy to identify novel proteolytically inactive mutants of a viral protease. Using PEDV 3CLpro as the model, we established a luciferase-based biosensor to monitor protease activity in cells and identified 4 novel amino acids essential for the activity of the PEDV 3CLpro (Trp31, Phe139, Gly142, and His162). MD simulations were also performed on the wild-type (WT) and single-substitution variants of 3CLpro to calculate the dynamic substrate envelopes. In agreement with the experimental loss in protease activity, the single-substitution mutants (W31A, F139A, G142A, and H162A) were seen to significantly disrupt the intermolecular hydrogen bonding network and the intermolecular dynamic correlations for the active sites, thus affecting the substrate binding affinity. Our results explain the potential molecular basis whereby the 3CLpro mutants were proteolytically inactivated, thereby providing more potential target sites for drug design.

MATERIALS AND METHODS 
Plasmids 

The cDNA expression construct that encodes PEDV 3CLpro and the luciferase reporter plasmids (233D and 358D) were previously described in refs. 13 and 33. The cDNA expression construct encoding PEDV 3CLpro was PCR amplified and cloned into the C-terminal hemagglutinin (HA) tag-encoding pCAGGS-HA-C plasmid. First, secondary structures in CoV 3CLpro were analyzed using ESPript (http://espript.ibcp.fr/ESPript/ESPript/index.php). Then, 7 aa sites were chosen based on the amino acid interactions predicted with an online method (https://mistic2.leloir.org.ar/). Mutagenesis of the PEDV 3CLpro constructs (to produce W31A, C38A, H41A, F139A, G142A, C144A, G145A, Y160A, and H162A) was carried out by overlapping extension PCR using specific mutagenic primers. Luciferase reporter plasmids (233D and 358D), which contain oligonucleotides corresponding to ENLYFQ↓YS [cleaved by tobacco etch virus (TEV) 3Cpro], were used as the reporter controls (32). The construction strategy for the luciferase-based biosensor plasmids (233DP and 358DP) to monitor the activity of PEDV 3CLpro was as follows. The DNA sequences encoding the N- and C-terminal halves of the catalytic subunit α of DNA polymerase III (DnaE), which encompass the protein's trans-splicing activity, were synthesized and cloned into the pCAGGS-multiple cloning site (MCS) vector to construct pCAGGS-DnaE. The sequences corresponding to the N-terminal fragments (aa 4-233) and the C-terminal fragments (aa 235-544) of firefly luciferase were PCR amplified from the firefly luciferase reporter vector pGL4.21[luc2P/Puro] (Promega, Madison, WI, USA). The sequences encoding aa 4-233 and 235-544, which were fused to the corresponding amino acid sequence YNSTLQ↓AGLRKM (the N-terminal auto-cleavage sequence in PEDV 3CLpro), were cloned into pCAGGS-DnaE to create the 233DP reporter. The same construction strategy was used to generate the 358DP reporter (Fig. 1A). All the constructs were validated by DNA sequencing.


Luciferase reporter gene assays 

Human embryonic kidney (HEK-293T) cells, obtained from the China Center for Type Culture Collection (Wuhan University, Wuhan, China), were cultured at 37°C in 5% CO2 in DMEM (Thermo Fisher Scientific, Waltham, MA, USA) supplemented with 10% fetal bovine serum. The luciferase reporter constructs (233DP and 358DP) and their controls (233D and 358D, respectively) were used to detect PEDV-specific 3CLpro activity. HEK-293T cells plated in 48-well plates were transfected with various 3CLpro expression plasmids or the empty control plasmid, together with the luciferase reporter plasmid and pRL-TK (Promega), which was used as an internal control to normalize the transfection efficiency. At 36 h post-transfection, the cells were lysed, and a luciferase reporter assay system (Promega) was used to determine the luciferase activities in the lysed cells. The firefly luciferase activities were normalized to the corresponding Renilla luciferase activities.

Figure 1. Exploiting the biosensor assay to evaluate PEDV 3CLpro activity in vivo. A) Diagram showing the generation of 233DP and 358DP constructs and their controls (233D and 358D, respectively). The blue structure represents the peptide sequence of the recombinant firefly luciferase. The black rectangle represents the Nostoc punctiforme (Npu) DnaE intein (DnaE I) peptide sequence used to cyclize the protein. The green rectangle represents the YNSTLQ↓AGLRKM protease recognition sequence that was used to assess PEDV 3CLpro activity, and the red rectangle represents the ENLYFQ↓YS protease recognition sequence for TEV 3Cpro, which was used as the control. B) HEK-293T cells were transfected with 233DP or 358DP, or the corresponding controls (233D and 358D, respectively) and the plasmid encoding PEDV 3CLpro. After 30 h, cell lysates were prepared and analyzed by Western blotting. αHA, antihemagglutinin; IB, immunoblotting; luc, luciferase.
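The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis script: the function names and the raw readings are hypothetical, and the fold induction is assumed to be the mean normalized activity of protease-transfected wells divided by that of empty-vector wells.

```python
def normalized_activity(firefly, renilla):
    """Firefly signal divided by the Renilla signal from the same well."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_induction(sample_wells, control_wells):
    """Mean normalized activity of sample wells relative to control wells.

    Each well is a (firefly, renilla) pair of raw luminescence readings.
    """
    sample = [normalized_activity(f, r) for f, r in sample_wells]
    control = [normalized_activity(f, r) for f, r in control_wells]
    return (sum(sample) / len(sample)) / (sum(control) / len(control))


# Hypothetical raw readings in arbitrary light units (not data from the study).
empty_vector = [(1000.0, 500.0), (1200.0, 600.0), (900.0, 450.0)]
with_protease = [(9000.0, 450.0), (10000.0, 500.0), (11000.0, 550.0)]

print(fold_induction(with_protease, empty_vector))
```

Dividing by the Renilla signal within each well before averaging keeps well-to-well differences in transfection efficiency from inflating or masking the reporter response.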


Western blotting analysis 

Briefly, HEK-293T cells cultured in 60-mm dishes were transfected with the various plasmids. After 30 h, the cells were harvested by adding lysis buffer, and the protein concentrations were measured in the whole cell extracts. The samples were resolved by SDS-PAGE and then transferred to PVDF membranes (MilliporeSigma, Burlington, MA, USA) to determine the protein expression levels. The membranes were then incubated with primary and secondary antibodies. The overexpression of PEDV 3CLpro WT and its distinct mutants was evaluated using an anti-HA antibody (Medical and Biological Laboratories, Nagoya, Japan). An anti-goat monoclonal secondary antibody (Promega) was used to analyze the expression level of each luciferase reporter gene. An anti-β-actin mouse monoclonal antibody (Beyotime, Shanghai, China) was used to monitor the expression level of β-actin to confirm that the protein loading was equal for the samples. The positions of the 100- and 70-kDa molecular mass bands were indicated by protein markers (26616; Thermo Fisher Scientific).

MD simulation protocol

Because the dimer structure of 3CLpro from CoVs is necessary for enzyme activity (34-36), the protease model used in this study was built from the X-ray structure of a dimeric PEDV 3CLpro mutant (C144A) bound to a peptide substrate (Protein Data Bank ID: 4ZUH; https://www.rcsb.org/). The C-terminal deletion PEDV 3CLpro in 4ZUH was substituted, and the substrate in the complex was replaced with YNSTLQ↓AGLRKM (the N-terminal auto-cleavage sequence of PEDV 3CLpro) using the SYBYL-X program (v.2.0; https://omictools.com/sybyl-x-tool). For consistency, this crystal structure was used as the template for constructing the single mutant complexes in SYBYL-X. All water molecules in the crystal structure were retained. The LEaP module in Amber 18 was used to add all of the missing hydrogen atoms (37). The ff14SB Amber force field was used to assign bonded and nonbonded parameters to the protein and its peptide substrate (38). Each system was solvated with a 12 Å shell of transferable intermolecular potential with 3 points (TIP3P) water in a truncated octahedron simulation box with periodic boundary conditions (39). Sodium (Na+) or chloride (Cl−) counterions were added to neutralize the overall charge of the system.

For each complex, a 50 ns MD simulation was run using the Amber ff14SB force field in Nanoscale Molecular Dynamics (v.2.13; https://www.ks.uiuc.edu/Research/namd/) and repeated 3 times (38, 40). To relieve bad contacts and to direct each system toward energetically favorable conformations, each system was minimized using a 2-step, extensive energy minimization process based on the steepest descent method followed by the conjugate gradient algorithm. First, water molecules and counterions were relaxed by restraining the complex with a harmonic constant of 100 kcal/(mol·Å²). Second, the restraint was removed to allow all of the atoms to move freely. After minimization, each system was gently heated from 0 to 310 K in 500 ps at a constant volume and equilibrated at 310 K for another 2 ns. Finally, a 50 ns MD simulation without any restraints was performed at constant pressure, and the coordinates of the atoms were saved every 5 ps. During the MD simulation, bonds involving hydrogens were constrained by the SHAKE algorithm, and a time step of 2 fs was adopted (41). The Langevin thermostat approach was employed to control the temperature with a collision frequency of 1.0 ps⁻¹ (42). The particle mesh Ewald method was used to treat the long-range electrostatic interactions (43, 44), and the cutoff distances for the long-range electrostatic and van der Waals (vdW) interactions were set at 10 Å.
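As a quick consistency check on the protocol, the stated time step, stage lengths, and save interval fix the number of integration steps and stored frames per run. The following sketch only performs that arithmetic; the constant and function names are illustrative and do not correspond to any simulation-engine keywords.

```python
# Values taken from the protocol described above.
TIME_STEP_FS = 2          # integration time step
HEATING_PS = 500          # heating from 0 to 310 K
EQUILIBRATION_NS = 2      # equilibration at 310 K
PRODUCTION_NS = 50        # unrestrained production run
SAVE_INTERVAL_PS = 5      # coordinates written every 5 ps
REPLICATES = 3

FS_PER_PS = 1000
PS_PER_NS = 1000


def steps_for(duration_ps, time_step_fs=TIME_STEP_FS):
    """Number of integration steps needed to cover duration_ps."""
    return duration_ps * FS_PER_PS // time_step_fs


heating_steps = steps_for(HEATING_PS)
equilibration_steps = steps_for(EQUILIBRATION_NS * PS_PER_NS)
production_steps = steps_for(PRODUCTION_NS * PS_PER_NS)
frames_per_run = PRODUCTION_NS * PS_PER_NS // SAVE_INTERVAL_PS
steps_between_frames = SAVE_INTERVAL_PS * FS_PER_PS // TIME_STEP_FS

print("heating steps:", heating_steps)
print("equilibration steps:", equilibration_steps)
print("production steps:", production_steps)
print("frames per run:", frames_per_run)
print("frames over all replicates:", frames_per_run * REPLICATES)
```

Under these settings each production run corresponds to 25 million steps and 10,000 saved frames, with one frame written every 2,500 steps.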

Analysis of the MD simulations 

Root mean square deviation calculations 

Root mean square deviation (RMSD) calculations were performed using the Visual Molecular Dynamics software package (45). The frames from each interval were aligned to the first frame of the trajectory, and the RMSD values were calculated using all of the backbone α-carbon atoms.


vdW contact potential calculations 

The vdW contact potential energy between the protease and its substrate was calculated over an MD trajectory and averaged using the molecular mechanics Poisson-Boltzmann surface area method. The values were averaged over 120 ns (i.e., the last 40 ns of each of the 3 replicate simulations).
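The 120 ns averaging window can be illustrated with a short sketch that keeps the trailing 40 ns of each 50 ns replicate and pools the retained frames. The helper names are hypothetical, and the sketch assumes that each replicate is stored as an evenly spaced series of per-frame values.

```python
def last_window(values, total_ns=50, window_ns=40):
    """Return the trailing part of a series covering window_ns of total_ns."""
    n_keep = len(values) * window_ns // total_ns
    return values[len(values) - n_keep:]


def pooled_mean(replicates, total_ns=50, window_ns=40):
    """Average a per-frame quantity over the trailing window of every replicate."""
    pooled = []
    for series in replicates:
        pooled.extend(last_window(series, total_ns, window_ns))
    return sum(pooled) / len(pooled)


# Toy series of 10 frames per replicate (hypothetical values, not study data).
toy_replicates = [list(range(10)), list(range(10)), list(range(10))]
print(last_window(toy_replicates[0]))
print(pooled_mean(toy_replicates))
```

With 10,000 frames per 50 ns run, this selection retains 8,000 frames per replicate, so the first 10 ns of each run are excluded from the averages.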


Hydrogen bond calculations

The percentage of time that a hydrogen bond existed during a trajectory was calculated using the HBonds Plugin from Visual Molecular Dynamics and averaged over 120 ns (i.e., the last 40 ns of each of the 3 replicate simulations) (45). A hydrogen bond was defined as having a donor-acceptor distance of a maximum of 3.5 Å, where only the polar atoms (nitrogen, oxygen, sulfur, and fluorine) were involved. The donor-hydrogen-acceptor angle was defined as being <40°. Hydrogen bonds were summed over each residue and substrate except when otherwise indicated.
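The geometric criteria above can be expressed as a small occupancy calculation. The sketch below is an illustration only and does not reproduce the plugin's implementation: it assumes that the 40° cutoff refers to the deviation of the donor-hydrogen-acceptor angle from linearity (180°), and the coordinates are hypothetical.

```python
import math


def _vec(a, b):
    """Vector pointing from point a to point b."""
    return tuple(q - p for p, q in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, max_angle=40.0):
    """Apply the distance (Å) and angle (degrees) criteria to one frame.

    Assumption: max_angle limits how far the D-H...A angle may deviate
    from 180 degrees.
    """
    if _norm(_vec(donor, acceptor)) > max_dist:
        return False
    hd = _vec(hydrogen, donor)
    ha = _vec(hydrogen, acceptor)
    cos_dha = sum(p * q for p, q in zip(hd, ha)) / (_norm(hd) * _norm(ha))
    dha = math.degrees(math.acos(max(-1.0, min(1.0, cos_dha))))
    return (180.0 - dha) < max_angle


def occupancy_percent(frames):
    """Percentage of frames in which the hydrogen bond is present."""
    hits = sum(is_hbond(d, h, a) for d, h, a in frames)
    return 100.0 * hits / len(frames)


# Hypothetical (donor, hydrogen, acceptor) coordinates in Å for 4 frames.
frames = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)),  # linear, 2.9 Å
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.8, 0.3, 0.0)),  # slightly bent
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0)),  # too far apart
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)),  # strongly bent
]
print(occupancy_percent(frames))
```

An occupancy computed this way for a mutant can be compared directly with the WT value to express a percentage change for a given donor-acceptor pair.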


Cross-correlation analysis 

To explore the effect of residue mutation on the conformation and internal dynamics of the protease-substrate complex, the cross-correlation matrix elements Cij, which reflect the fluctuations of the coordinates of the Cα atoms relative to their mean positions, were calculated from the last 40 ns of the MD trajectory for each system using the following equation, where the angle brackets represent time averages over the recorded snapshots:

\[ C_{ij} = \frac{\langle \Delta R_i \cdot \Delta R_j \rangle}{\sqrt{\langle \Delta R_i^{2} \rangle \, \langle \Delta R_j^{2} \rangle}} \]

ΔRi indicates the fluctuation in the position vector R of site i, and ΔRj is the fluctuation in the position vector R of site j (46). A more positive Cij value represents a more strongly correlated atomic fluctuation in the ith and jth residues.
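A minimal sketch of this calculation is given below, assuming the commonly used normalized form Cij = ⟨ΔRi·ΔRj⟩ / sqrt(⟨ΔRi²⟩⟨ΔRj²⟩) and trajectories that have already been aligned to a common reference. The function names and the toy coordinates are hypothetical.

```python
import math


def _fluctuations(traj):
    """Displacement of one atom from its mean position in every frame."""
    n = len(traj)
    mean = [sum(frame[k] for frame in traj) / n for k in range(3)]
    return [[frame[k] - mean[k] for k in range(3)] for frame in traj]


def cross_correlation(traj_i, traj_j):
    """Normalized cross-correlation between the fluctuations of two atoms.

    Each trajectory is a list of (x, y, z) positions, one per snapshot.
    An atom that never moves has zero variance and is not handled here.
    """
    d_i = _fluctuations(traj_i)
    d_j = _fluctuations(traj_j)
    n = len(d_i)
    cov = sum(sum(a * b for a, b in zip(u, v)) for u, v in zip(d_i, d_j)) / n
    var_i = sum(sum(a * a for a in u) for u in d_i) / n
    var_j = sum(sum(b * b for b in v) for v in d_j) / n
    return cov / math.sqrt(var_i * var_j)


# Toy Cα trajectories (hypothetical coordinates in Å, 3 snapshots each).
atom_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
atom_b = [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0), (7.0, 0.0, 0.0)]  # moves with a
atom_c = [(9.0, 0.0, 0.0), (8.0, 0.0, 0.0), (7.0, 0.0, 0.0)]  # moves against a
atom_d = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]  # perpendicular

print(cross_correlation(atom_a, atom_b))
print(cross_correlation(atom_a, atom_c))
print(cross_correlation(atom_a, atom_d))
```

Values near +1 indicate motion in the same direction, values near −1 indicate anticorrelated motion, and values near 0 indicate uncorrelated or perpendicular motion.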


Statistical analysis 

The results are presented as the means ± SD of at least 3 experiments. Significant differences were detected using Student's t test. Values of P < 0.05 were considered statistically significant.


RESULTS 

Exploiting the biosensor assay to evaluate
PEDV 3CLpro activity in vivo

To establish a firefly luciferase reporter system to monitor the activity of PEDV 3CLpro in mammalian cells, we used an inverted, cyclized recombinant firefly luciferase construct (pCAGGS-DnaE) separated by an engineered site corresponding to the N-terminal YNSTLQ↓AGLRKM auto-cleavage sequence in PEDV 3CLpro. DnaE is widely used in protein cyclization because it improves the sensitivity of luciferase detection without affecting the luciferase's activity (31, 47, 48). As shown in Fig. 1A, the expressed N- and C-terminal fragments (233DP and 358DP) were cyclized in the presence of DnaE to restrict the movement of the 2 domains, which locked the enzyme into a less active form. Upon cleavage by PEDV 3CLpro, which recognizes the engineered cleavage site, the 2 firefly luciferase domains could theoretically interact freely and change into an active form of the luciferase. To detect any nonspecific cleavage by PEDV 3CLpro, the 233D and 358D systems were fused to the ENLYFQ↓YS sequence, which is recognized by TEV 3Cpro, and used as the controls for the corresponding proteins (Fig. 1A).

To further determine whether the system fused to the N-terminal auto-cleavage sequence of PEDV 3CLpro was successfully recognized and cleaved by PEDV 3CLpro, HEK-293T cells were transfected with the PEDV 3CLpro expression plasmid, together with the reporter 233DP or 358DP constructs, or the corresponding controls. Western blotting analyses showed that the protein bands from the cells cotransfected with PEDV 3CLpro and 233DP or 358DP migrated fastest. On account of the cyclization conferred by DnaE, 233DP and 358DP were linearized after cleavage, thereby possessing greater mobility and appearing as a slightly smaller product than the cyclized protein on a Western blot (Fig. 1B). No cleavage activity was detected in the cells transfected with the 233D or 358D systems fused to the recognition sequence of TEV 3Cpro (Fig. 1B). These results confirm that the recombinant luciferase constructs fused to the N-terminal auto-cleavage sequences of PEDV 3CLpro are specifically recognized and cleaved by PEDV 3CLpro, suggesting their potential utility in assessing the activity of PEDV 3CLpro in HEK-293T cells.

Reliability of the cyclized luciferase-based
biosensor (233DP) at detecting PEDV 3CLpro
activity in mammalian cells

To evaluate the function of the reporter in the luciferase activity assay, the PEDV 3CLpro expression plasmid, in addition to each of the reporters or their respective controls and Renilla luciferase plasmids, was transfected into HEK-293T cells. The cells were lysed at 36 h post-transfection, and a dual-luciferase assay was performed on the lysates. As shown in Fig. 2A, the activity of 233DP was markedly induced by PEDV 3CLpro, whereas that of the control reporter 233D remained low. In contrast, the background activity of 358DP was somewhat higher than that of 358D in the absence of PEDV 3CLpro expression (Fig. 2A), suggesting that the increased activities of this reporter luciferase might be nonspecific. These results show that the 233DP reporter is a more sensitive and reliable biosensor for evaluating PEDV 3CLpro activity. To further verify the performance of 233DP in the luciferase activity assay, HEK-293T cells were transfected with different amounts of the PEDV 3CLpro expression plasmid and the 233D or 233DP reporter. As shown in Fig. 2B, a dose-dependent response was evident, with increasing amounts of protease expression leading to higher luciferase activity levels. Western blotting also revealed that PEDV 3CLpro was able to cleave the recombinant firefly luciferase in a dose-dependent fashion, producing a faster migrating protein band (Fig. 2C). The consistency of the cleavage and the fold induction confirms that a correlation exists between the luciferase activity assay and reporter construct cleavage by PEDV 3CLpro.

Figure 2. Reporter 233DP reliably detects PEDV 3CLpro activity in cells. A) HEK-293T cells in 24-well plates were transfected with each of the 2 reporters or their corresponding controls, the pRL-TK plasmid, and the PEDV 3CLpro expression plasmid. Luciferase assays were performed 36 h post-transfection. ns, not significant. ****P < 0.0001. B) HEK-293T cells were transfected with 233DP, pRL-TK, and various concentrations of the PEDV 3CLpro expression plasmid. The transfected cells were lysed for a dual-luciferase assay at 36 h post-transfection. C) HEK-293T cells were cotransfected with PEDV 3CLpro and the 233DP expression plasmid. Cell lysates were prepared 30 h post-transfection and then subjected to Western blotting. αHA, antihemagglutinin; IB, immunoblotting; luc, luciferase.
Identifying the novel amino acid residues
involved in PEDV 3CLpro activity

CoV 3CLpro employs conserved cysteine and histidine residues (Cys144 and His41 in the case of PEDV 3CLpro) as the principal nucleophile and general acid-base catalyst, respectively, at its catalytic site (49-51). To screen for additional amino acids impinging on the activity of PEDV 3CLpro, 7 aa sites (Trp31, Cys38, Phe139, Gly142, Gly145, Tyr160, and His162) were chosen because they are highly conserved in CoV 3CLpro (Fig. 3A) and have a strong interaction network with other amino acids in 3CLpro (Fig. 3B) (52). The 233DP reporter system was used to assess the protease activities of the single-substitution variants (W31A, C38A, F139A, G142A, G145A, Y160A, and H162A), with the catalytic-residue variants H41A and C144A used as the positive controls. As shown in Fig. 3C, WT PEDV 3CLpro and 3CLpro-C38A successfully induced luciferase activity, whereas the other mutants failed to activate the 233DP reporter. To explore the mechanism underlying the failure of the overexpressed 3CLpro mutants to induce reporter luciferase activity, WT PEDV 3CLpro or the 3CLpro mutant was overexpressed in the presence of the recombinant firefly luciferase 233DP reporter. The cyclized form of the recombinant firefly luciferase was cleaved normally by WT PEDV 3CLpro and 3CLpro-C38A, generating faster migrating protein bands on Western blots. However, no obvious cleavage products were observed when the other mutants were overexpressed. Interestingly, the protein abundance of the G145A and Y160A mutants was significantly reduced when compared with that of WT PEDV 3CLpro (Fig. 3D). Unfortunately, it is difficult to determine whether a decrease in protein expression or in protease activity leads to this phenomenon, because the mutations (G145A and Y160A) abrogated not only the catalytic activity but also the protein expression of PEDV 3CLpro. Thus, the G145A and Y160A mutations, which both mediated a reduction in the catalytic activity of PEDV 3CLpro, were not investigated further in this study.

Figure 3. Identifying the novel amino acid residues involved in PEDV 3CLpro activity. A) Amino acid alignment of the conserved region of the CoV 3CLpro (numbering is based on PEDV 3CLpro). Secondary structural elements in PEDV 3CLpro are represented as η for 3₁₀ helices, arrows for β-strands, and T for β-turns. Residues conserved in all CoV 3CLpros are presented in white on a red background. Conserved residues in most of the CoV 3CLpros are presented in red and boxed with a white background. Blue arrows indicate the residues we selected. The sequences were derived from GenBank entries with the following accession numbers: PEDV, AF353511; Human_coronavirus_229E, AF304460; Human_Coronavirus_NL63, AY567487; Feline_infectious_peritonitis, AY994055; Scotophilus_bat_coronavirus, DQ648858; Bat_coronavirus_HKU2, EF203064; Bat_coronavirus_1A, EU420138; Bat_coronavirus_HKU8, EU420139; Mink_coronavirus_strain_WD1127, HM245925; Rousettus_bat_coronavirus_HKU10, JQ989270; Bat_coronavirus_CDPHE15USA2006, KF430219; BtMrAlphaCoV/SAX2011, KJ473806; BtRfAlphaCoV/HuB2013, KJ473807; BtRfAlphaCoV/YN2012, KJ473808; BtNvAlphaCoV/SC2013, KJ473809; Ferret_coronavirus_isolate_FRCo, KM347965; Swine_enteric_coronavirus_strain, KR061459; Camel_alphacoronavirus_isolate, KT368907; NL63related_bat_coronavirus_strain, KY073744; Wencheng_Sm_shrew_coronavirus, KY967717; Porcine_coronavirus_HKU15_strain, JQ065042; Avian_infectious_bronchitis_virus, M95169; and Beta_PEAV, MG742313. B) The predicted related amino acids are listed and connected with black solid lines. C) HEK-293 cells were transfected with 233DP and pRL-TK and with increasing quantities of plasmid encoding the WT or mutant PEDV 3CLpro. The cells were harvested after 36 h and subjected to a dual-luciferase assay. D) HEK-293T cells were cotransfected with the WT or mutant PEDV 3CLpro and the 233DP expression plasmid. Cell lysates were prepared at 30 h post-transfection and analyzed by Western blotting. αHA, antihemagglutinin; IB, immunoblotting; luc, luciferase.

Hydrogen bond interactions of substrates 
vs. protease 

To investigate the potential mechanism underlying the single-substitution variants and in support of our experimental data, MD simulations were performed to examine the dynamic mechanism of the proteolytically inactive PEDV 3CLpro mutants. Based on the crystal structures of the PEDV 3CLpro complexes (Protein Data Bank identifier: 4ZUH) (53), 3 replicates of the 50 ns MD simulations were performed for each PEDV 3CLpro complex. In each simulation, the RMSD values of the Cα atoms converged and remained stable. In the simulations over the last 40 ns, the overall binding modes of the WT and mutant complexes were conserved when bound to the substrate (Supplemental Fig. S1). As shown in the structure, only P4-P2' in the substrate fit comfortably in the active site, whereas the other residues floated out of the protease pockets (Fig. 4A, B). Then, we calculated the mean occupancy of the hydrogen bonds during the MD simulations to better capture the intermolecular polar interactions. Overall, the hydrogen bonding network in the substrate packing was stably retained within the WT MD simulation. The most prevalent hydrogen bonds in the active site are formed by P1-Gln and residues Gly142, Cys144, His162, or Gln163 in the S1 pocket, which may help to stabilize the substrate in the active site during the cleavage reactions (Fig. 4C and Supplemental Fig. S2). The Nε2 atom of His162 and the Oε1 atom of P1-Gln form a hydrogen bond, which is stabilized by the π-π stacking interactions between Phe139 and His162. There is an oxyanion hole constituted by the main chain amides of Gly142 and Cys144 to stabilize the carbonyl oxygen of P1-Gln, which is reported to be critical for cleavage (Supplemental Fig. S3) (53). In addition, another residue forming strong hydrogen bonds with the substrate is Glu165 in the S4 pocket, whose backbone links tightly to P4-Ser and P3-Thr (Fig. 4C and Supplemental Fig. S2). The stable hydrogen bond network in the WT complex between the S1 and S4 pockets is consistent with the crystal structure elucidated in previous studies, further verifying the importance of the S1 and S4 pockets in substrate binding (14, 54).

The Phe139, Gly142, and His162 residues in the protease active site make direct hydrogen bonds with the substrate. Compared with the WT complex, the mutation of residue 139 resulted in no considerable difference in the hydrogen bonding between the main chain of Phe139 and P1-Gln. However, the loss of π-π stacking interactions between Phe139 and His162 caused subtle rearrangements in the structure that resulted in decreased interactions between His162 and P1-Gln and between Gln163 and P1-Gln (−35 and −25%, respectively) (Supplemental Fig. S4). Interestingly, the G142A substitution decreased the interactions not only at this position with P1-Gln but also at the other active sites, especially that between Gln163 and P1-Gln (−72%) (Fig. 4D and Supplemental Figs. S4 and S5). Notably, the H162A variant was the most disrupted, with 10 hydrogen bonds changing by >15% relative to the WT complex throughout the dimer; 9 of these were weakened, most dramatically the interactions of the side chain of His162 with P1-Gln (a 96% reduction) (Fig. 4E). In addition to the mutants within the protease active site, a remote mutant site, W31A, resulted in the loss of at least 2 intermolecular hydrogen bonds at the S4 pocket. As expected, the C38A substitution did not cause any considerable changes in the active site relative to the WT complexes (Fig. 4D and Supplemental Fig. S5). Overall, the active site polymorphisms in the S1 and S4 pockets severely disrupted the intermolecular hydrogen bonding network in the active site and affected substrate binding.

Differences in the activity of PEDV 3CL pro alter 
substrate packing 

In addition to the hydrogen bond interactions shown for the packed substrate at the active site, we calculated the vdW contact energies between the active site and the substrate in each complex for more detail. The total vdW contact energies were conserved between the WT and C38A complexes (-98.9 and -98.1 kcal/mol, respectively), but striking energy losses were evident for the W31A, F139A, G142A, and H162A complexes when compared with the WT value (-90.5, -91.5, -89.2, and -86.7 kcal/mol, respectively) (Fig. 5A and Table 1), a result consistent with the experimental loss of protease activity and the severe disruption of the hydrogen bonding network (Fig. 4 and Supplemental Fig. S5).
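The loss of vdW contact energy in each mutant follows directly from the totals quoted above: it is the mutant value minus the WT value. The short sketch below reproduces that subtraction using only the energies reported in the text and Table 1. The function and variable names are illustrative.

```python
# Total vdW contact energies (kcal/mol) as reported in the text and Table 1.
total_vdw = {
    "WT": -98.9,
    "W31A": -90.5,
    "C38A": -98.1,
    "F139A": -91.5,
    "G142A": -89.2,
    "H162A": -86.7,
}

def vdw_change_relative_to_wt(energies, reference="WT"):
    """Mutant minus reference energy; positive values mean weaker contacts."""
    ref = energies[reference]
    return {
        name: round(value - ref, 1)
        for name, value in energies.items()
        if name != reference
    }

change = vdw_change_relative_to_wt(total_vdw)

# List the complexes from the largest to the smallest loss of contact energy.
for name, value in sorted(change.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:+.1f} kcal/mol")
```

The resulting differences match the second column of Table 1, with H162A showing the largest loss and C38A remaining close to WT.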

To quantify the interactions of the substrate with the individual active site residues, we calculated the intermolecular vdW interactions over the MD trajectories for each residue at the active site. In line with the conserved overall binding mode, the strongest substrate-protease interactions in the WT complex occurred with the Met25 and Asn141 residues (-3.72 and -3.25 kcal/mol, respectively). Compared with the WT protease, the contact energy landscape was highly conserved in the C38A complex but disrupted in the W31A, F139A, G142A, and H162A complexes, with a conspicuous loss of more than 1.2 kcal/mol in the interaction with the Met25 residue (Fig. 5B and Supplemental Fig. S6). Moreover, a considerable loss of contacts in the S1' pocket, in particular with Asn24 and Ala26, was evident. The decline in vdW contacts between the S1' pocket and the substrate indicates that the C terminus of the substrate became more disordered in the single-mutant complexes. Because the vdW energy depends on interatomic distance, we calculated the distance between Met25 and the substrate and found that P4'-Arg had moved markedly away from the Met25 residue in the W31A, F139A, G142A, and H162A mutants (Supplemental Fig. S7). Interestingly, previous research has shown that the M25T single-substitution mutant failed to cleave an NF-κB essential modulator signaling-derived substrate (53), indicating that the Met25 residue plays a crucial role in substrate binding.
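The link between a lost vdW contact and an increased Met25-substrate distance can be illustrated with a pairwise potential. The vdW equation used in the study is not given in this section, so the sketch below assumes the common 12-6 Lennard-Jones form. The well depth `epsilon` and contact distance `sigma` are illustrative values and are not parameters from the study.

```python
def lennard_jones(r, epsilon=0.2, sigma=3.5):
    """12-6 Lennard-Jones energy (kcal/mol) for two atoms r angstroms apart.

    epsilon (well depth) and sigma (zero-energy distance) are illustrative
    values only; they are not taken from the study or from a force field.
    """
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)

# Separation at which the attraction is strongest for this parameter choice.
r_min = 2 ** (1 / 6) * 3.5

# Energy becomes less favorable (closer to zero) as the atoms move apart.
for r in (r_min, 5.0, 7.0):
    print(f"r = {r:.2f} A, E = {lennard_jones(r):.4f} kcal/mol")
```

Under this assumed form, the attraction decays roughly as the inverse sixth power of the distance, so even a modest displacement of a substrate residue away from a pocket residue removes most of the favorable contact energy of that atom pair.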
Figure 4. Hydrogen bond interactions of substrates vs. protease. A) Crystal structure of the substrate bound to the active site with only one protease monomer shown for clarity. The substrate is shown in blue. B) Diagram of the substrate envelope model. Only P4-P2' in the substrate fit nicely in the pockets of the active site. C) Histograms of the changes in the percentage of time that hydrogen bonds are formed relative to the WT simulation for each of the complexes. D) Schematic hydrogen bond network in the active site with the percentage of time hydrogen bonds are formed during the WT simulation. The dashed blue lines represent the hydrogen bond interactions. The substrate is shown from P4-Ser to P2'-Gly considering the substrate envelope model and previous research. E) Schematic representation of the H162A complex simulation with changes in hydrogen bonding relative to the WT simulations. The schematic for the remaining variants is shown in Supplemental Fig. S5.


Figure 5. Differences in the activity of PEDV 3CL pro alter substrate packing. A) Histograms of the total vdW contact energies of the 6 complexes. B) The vdW contact potentials averaged from MD simulations of protease active site residues for the substrate bound to WT, W31A, C38A, F139A, G142A, and H162A proteases, respectively. The warmer (red) and cooler (blue) colors indicate more and less contact with the substrate, respectively. Single-substitution residues are highlighted in red. Trp31 and Cys38, outside the active site, are not marked. See also Supplemental Fig. S6.

TABLE 1. vdW interactions between protease and substrate in WT and mutant complexes

System    vdW contact energy (kcal/mol)    ΔΔG vdW (kcal/mol)
WT        -98.9                            -
W31A      -90.5                            +8.4
C38A      -98.1                            +0.8
F139A     -91.5                            +7.4
G142A     -89.2                            +9.7
H162A     -86.7                            +12.2

Loss of dynamic correlations during
protease-peptide atomic fluctuations in
the proteolytically inactive mutants

In principle, tight-binding substrates are characterized by strong intermolecular interactions with their cognate proteins, which persist over the dynamics of individual enzymes (25). Conservation of protease-inhibitor dynamic cross-correlations is often incorporated into the rational design and computational evaluation of protein inhibitors in structure-based drug design (26, 27). To further investigate the coupling of atomic fluctuations between the active site surface of the protease and the substrate, we calculated the cross-correlation coefficients between the atomic fluctuations of the protease's backbone and the peptide substrate's atoms. On the basis of our binding model and previous research, only P4-P2' in the substrate are shown in the results (14, 53). As shown in Fig. 6A, the protease active site is mainly composed of 4 pockets, S4, S2, S1, and S1', which correspond to the P4, P2, P1, and P1' sites in the substrate, respectively. In the WT complex, the dynamics of the substrate were highly correlated with the motions of the residues in the protease's active site. In addition to the conserved S1-P1 and S4-P4 interactions revealed by the hydrogen bonding analysis, this coupling was most pronounced for active site residues 162-165, which displayed correlations with most of the substrate's moieties. Additionally, the dynamics of the P1-Gln moiety of the substrate were highly coupled with the dynamics of residue Gly142 and the Cys144 catalytic site. Neither of these correlations changed when Cys38 was mutated to an alanine, consistent with the stabilization of the intramolecular dynamics in the C38A complexes (Fig. 6B).
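The cross-correlation analysis described above can be sketched numerically. The exact formula is not stated in this section, so the example assumes the standard normalized covariance of atomic displacements, C(i,j) = <Δr_i · Δr_j> / sqrt(<Δr_i²><Δr_j²>), evaluated on trajectories that have already been superposed onto a common reference. The array shapes, function name, and synthetic coordinates are illustrative and are not data from the study.

```python
import numpy as np

def cross_correlation(coords_a, coords_b):
    """Normalized cross-correlation between atomic fluctuations of two atom sets.

    coords_a: array of shape (n_frames, n_atoms_a, 3), e.g. protease backbone atoms
    coords_b: array of shape (n_frames, n_atoms_b, 3), e.g. substrate atoms
    Both sets are assumed to come from the same pre-aligned trajectory.
    Returns an (n_atoms_a, n_atoms_b) matrix with values in [-1, 1].
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    da = a - a.mean(axis=0)  # displacement of each atom from its mean position
    db = b - b.mean(axis=0)
    # Frame-averaged dot product of the displacement vectors of atoms i and j.
    cov = np.einsum("fik,fjk->ij", da, db) / a.shape[0]
    # Root-mean-square fluctuation of each atom, used for normalization.
    rms_a = np.sqrt((da ** 2).sum(axis=2).mean(axis=0))
    rms_b = np.sqrt((db ** 2).sum(axis=2).mean(axis=0))
    return cov / np.outer(rms_a, rms_b)

# Synthetic check (hypothetical coordinates): one "protease" atom and two
# "substrate" atoms, one moving with it and one moving exactly opposite to it.
rng = np.random.default_rng(0)
motion = rng.normal(size=(200, 1, 3))
protease = motion
substrate = np.concatenate([motion, -motion], axis=1)

matrix = cross_correlation(protease, substrate)
print(matrix.shape)
```

In the synthetic check, the atom that moves with the protease atom gives a coefficient of +1 and the atom that moves against it gives -1. Uncoupled motion would give values near 0, which is the pattern described for the inactive mutants.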

In contrast, there was a considerable loss of correlation between P2-Leu, P3-Thr, or P4-Ser and residues 162-167 in both the W31A and F139A complexes; the reduction was much more severe in the F139A complex than in the W31A complex. The H162A substitution severely reduced the dynamic coupling of the substrate to the protease's active site. In addition to the striking loss of correlation between P2-Leu, P3-Thr, or P4-Ser and residues 162-167, the correlation between P1-Gln and the catalytic residue Cys144 was severely reduced (Fig. 6B). The disrupted correlations for the S1 and S4 pocket residues agree with the observed loss of intermolecular interactions for P1-Gln and P4-Ser in this substrate. Interestingly, in the G142A protease, the loss of the hydrogen bond between residue 142 and the carbonyl oxygen of P1-Gln undermined the stability of the oxyanion hole, leading to complete disruption of substrate-active site correlations (Figs. 4 and 6B). Overall, the loss of coupling between substrate and protease dynamics appears to correlate with the reduced protease activity of the single-mutant variants.
Figure 6. Protease-substrate dynamic coupling of an oligopeptide bound to WT, W31A, C38A, F139A, G142A, and H162A proteases. A) Diagram of the binding pocket in the active site. S1': Asn24, Met25, Ala26, Leu27, His41; S1: Phe139, Ile140, Asn141, Gly142, Ala143, Cys144, Gly145, His162, Gln163; S2: Ile51, Asp186, Gln187, Pro188; S4: Leu164, Glu165, Leu166, Gly167, Leu190, Gln191. B) Cross-correlations between atomic fluctuations of protease active site residues and substrate in different complexes. Warm colors in the matrices indicate increased correlations. Residues are colored on the surface to indicate their locations in the active site.


DISCUSSION 

With their pivotal roles in the multiplication and proliferation of CoVs, 3CL pro s are recognized as the major targets of protein inhibitors in anti-CoV therapies (14, 21, 35). Developing assays that efficiently detect the activity of CoV proteases is a key step toward screening for specific PEDV 3CL pro inhibitors or broad-spectrum inhibitors of CoV proteases. Thus, analyzing 3CL pro activity in live cells with an efficient, high-throughput strategy is critical to moving this field forward. In the present study, we developed a luciferase-based protease activity biosensor, which contained DnaE and the N-terminal auto-cleavage sequences of PEDV 3CL pro. DnaE was used to cyclize the 2 domains of firefly luciferase to generate a cyclized recombinant firefly luciferase without affecting the activities (31, 47, 48). With a circularly permuted luciferase, the movement of the 2 firefly luciferase domains is restricted, locking the enzyme into its less-active open form (32). This underpins the high sensitivity and low background of the 233DP reporter system (Fig. 2B), confirming the outstanding prospects of the 233DP reporter in the analysis of PEDV 3CL pro activity.

Thorough elucidation of substrate-active site interactions is crucial for rational drug design, and further improvements in this area are needed if broader potencies are to be achieved (26, 27, 55-57). Following the SARS outbreak, a series of inhibitors that prevent viral replication were reported to act against the 3CL pro of SARS-CoV (17-19). However, the binding mode of the substrate and active site was described only on the basis of several crystal structures. Previous studies have shown that therapeutic efforts against HCV and HIV have greatly benefited from MD-based drug design, specifically in developing viral protease inhibitors (23-27). Our MD analysis has revealed the potential structural mechanism for substrate binding in more detail. The side chain of the conserved P1-Gln fits comfortably in the S1 pocket, stabilized by a hydrogen bonding network, whereas the prevalent hydrogen bonds of P3-Thr and P4-Ser are with residues Glu165 and Gln191 in the S4 pocket. Notably, we identified dynamic cross-correlations between the substrate and the target active site, which revealed a strong positive interdependency for P1-S1 and P4-S4. We further explored the protein surface of PEDV 3CL pro using a variety of small "probe" molecules with FTMap (58), which showed that the druggable sites in PEDV 3CL pro comprise a cluster of binding hot spots in the S1 and S4 pockets (Supplemental Fig. S8). Taken together, the intermolecular hydrogen bonding network and intermolecular dynamic correlations further confirm that the S1 and S4 pockets are key to substrate recognition and represent an ideal target for drug design.

More importantly, the high replication rate and high mutation frequency that occur during the replication of viruses such as SARS-CoV and PEDV (9, 10, 59, 60) mean that every possible nonsense mutation is likely introduced into the viral genome on a daily basis, which may lead to resistance-associated substitutions in the target proteins. CoV 3CL pro employs conserved cysteine and histidine residues in the S1 pocket as the principal nucleophile and general acid-base catalyst, respectively, at its catalytic site (49-51). One strategy used to avoid drug resistance in the rational design of protein inhibitors is for the inhibitor to stack on the catalytic residues and interact with them directly. The catalytic residues are critical for the biologic function of the protease and thus are almost always invariant (27). We screened 7 aa on the basis of interaction networks and conservation in PEDV 3CL pro. Four of the corresponding mutants (W31A, F139A, G142A, and H162A) were incapable of activating the luciferase-based biosensor. The irreplaceability of these 4 sites for the N-terminal cleavage of PEDV 3CL pro indicated that they are critical for the maturation of this protease. Specifically, Phe139, Gly142, and His162 are within the protease active site. These 3 mutations (F139A, G142A, H162A) unfavorably alter or completely disrupt the active site's intermolecular network of hydrogen bonds and the intermolecular dynamic correlations in the S1 and S4 pockets required for tight substrate binding, leading to the loss of protease activity. Interestingly, although Trp31 is outside the active site and away from the binding surface for dimerization, it lies close to the S1' pocket (Supplemental Fig. S9). The single substitution might utilize a common mechanism or pathway for altering the protease-substrate interactions (23). This mutation (W31A) might cause subtle but significant rearrangements in the structure, resulting in altered interactions with the bound substrate and affecting the dynamic assemblies of the complexes; these changes were particularly obvious in the MD analysis at the S1 and S4 pockets.

Here, Phe139, Gly142, and His162 are crucial for protease activity and are thus irreplaceable for viral replication. More importantly, these 3 aa are located on the surface of the S1 pocket. Their side chains or main chains are potential sites for hydrogen bond interactions formed during the binding of small-molecule drugs. These characteristics give them the potential to serve as target sites for newer-generation inhibitors designed to overcome drug resistance. Furthermore, we incorporated conformational dynamics into a dynamic substrate envelope model using MD simulations and compared the WT complex with various mutant complexes to explain the mechanism involved in the substrate-active site interactions, something that will be necessary for the design of protein inhibitors and for improving the potency of these inhibitors against any emerging inhibitor-resistant variants. 3C pro or 3CL pro enzymes are encoded by plus-stranded RNA viruses and double-stranded RNA viruses (61). Therefore, the method established in the present study can be used as a reference for other viruses.

In summary, the 233DP reporter system we developed in this study is an improvement over other biosensors in that it is sensitive and yields reproducible data. We identified 4 proteolytically inactive mutants using this method and further elucidated the potential molecular mechanism by MD analysis. The proteolytically crucial sites in the active site might expand the set of target residues for drug design. We also confirmed the importance of the S1 and S4 pockets in substrate binding. The MD simulations of the PEDV WT 3CL pro and its mutant complexes enabled a detailed analysis of the protease-substrate binding model, which is critical for the rational design of protease inhibitors. Overall, our successful development of a luciferase-based biosensor for measuring protease activity will facilitate the screening and identification of effective protease inhibitors against PEDV 3CL pro and future emerging CoVs.